Close out review debts: perf fix, dedup, latent-trap cleanup, Ladybug migration doc close-out by HumanBean17 · Pull Request #333 · HumanBean17/java-codebase-rag

HumanBean17 · 2026-06-15T21:31:38Z

Applies the four prioritized debts from the six-domain codebase review (no big refactor — the architecture was assessed sound; these are the bounded, high-value cleanups).

Debt 2 — `is_ignored` O(n²) on the index hot path

LayeredIgnore.is_ignored returned (bool, IgnoreLayer) and ran _winning_row (one GitIgnoreSpec rebuild per ignore-rule prefix) on every call. All 7 production callers discarded the layer; only diagnose-ignore needs source attribution, and diagnose_dict already computes it. is_ignored now returns a plain bool; the two tests that checked the layer migrate to diagnose_dict. On a repo with ~100 ignore rules this cuts ~5000 spec compilations per indexed file down to one.

Debt 4 — mechanical duplication (4a–4d)

ast_java: four byte-identical _codebase_*_inner_annotation_nodes walkers → one _inner_annotation_nodes(node, src, target_simple).
graph_enrich: three identical _route/_client/_async _hint_lookup helpers → one generic _hint_lookup (TypeVar).
ladybug_queries: find_callers / find_callees (~50-line near-twins) → one _walk_calls; public signatures preserved.
pr_analysis: drop the dead notes local in compute_risk (never appended; real notes come from analyze_pr_pipeline).

Deferred (4e): the meta() 5-level query cascade in ladybug_queries is not cleanly dead — it is reachable via direct LadybugGraph(...) construction that bypasses get()'s ontology gate, and test_feign_not_exoser relies on its fallback. Left in place; revisit after the migration fully settles.

Debt 3 — latent footguns

mcp_v2: 15 error branches passed hints=[] to NeighborsOutput / DescribeOutput, which have no hints field (real fields advisories / hints_structured default to []). pydantic silently dropped the kwarg — harmless today, but a ValidationError landmine the moment anyone adds extra="forbid". Removed (the models already default to empty).
build_ast_graph: the four brownfield layer names were spelled out 4×. Promoted _BROWNFIELD_LAYERS to the single source of truth; brownfield_strategies is now _BROWNFIELD_LAYERS | {codebase_client, codebase_producer}. Behavior-preserving — the two sets still differ deliberately (layer gate vs. stat counter); that relationship is now explicit instead of two unrelated literals.

Debt 1 — close out the LadybugDB migration

The KuzuDB→LadybugDB migration (#302) landed as code but its doc/string sweep and plan close-out were never finished. Completed here:

Docs: current-state Kuzu→LadybugDB, code_graph.kuzu→.lbug, --kuzu-path→--ladybug-path, kuzu_queries.py→ladybug_queries.py, KuzuGraph→LadybugGraph, kuzu_path→ladybug_path; ontology 15 / 16→17 (code was already 17). Across README, AGENTS, CONFIGURATION, CLI guide, verification checklist, CODEBASE_REQUIREMENTS, AGENT-GUIDE, PRODUCT-VISION, tests/README.
Factual fixes (from the parallel markdown freshness audit): README DECLARES_ROUTE (nonexistent edge)→EXPOSES; role list no longer lists PRODUCER (a node kind) and now includes COMPONENT / CONFIG / ENTITY; EMBEDDING_MODEL→SBERT_MODEL (the real env var). AGENT-GUIDE + SKILL route frameworks corrected to spring_mvc / webflux (kafka/rabbitmq/jms/stream are route kinds; feign is a client kind). PRODUCT-VISION CALLS is shipped, not planned. External citation titles left as Kuzu.
Shipped artifacts: install_data/{skills,agents} explorer copies re-synced from source (were behind — missing source_layer, schema-rejection note). Moved the landed PLAN / propose for LADYBUG-DB-MIGRATE and INDEX-OUTPUT-REWORK from active/ to completed/.
Source: docstring/help-string sweep only (cli/pr_analysis/mcp_v2/search_lancedb, conftest, test docstrings) — no behaviour change; the one stale kuzu 0.11.x version reference in mcp_v2 is genericized.

Verification

.venv/bin/ruff check . — clean.
.venv/bin/python -m pytest tests -q — 838 passed, 13 skipped (skips are the JAVA_CODEBASE_RAG_RUN_HEAVY-gated tests).

Notes

No re-index required — no schema, ontology, or ranking change (ontology stays 17; only docs that wrongly said 15/16 were corrected).
No env-var change — SBERT_MODEL was always the real variable; only the doc that said EMBEDDING_MODEL was fixed.
is_ignored return type changed from tuple[bool, IgnoreLayer|None] to bool (internal API; all callers updated). Breaking change is allowed per repo policy.

🤖 Generated with Claude Code

is_ignored returned (bool, IgnoreLayer) and computed _winning_row (one GitIgnoreSpec rebuild per ignore-rule prefix) on every call. All 7 production callers (iter_java_source_files + java_index_flow_lancedb) discarded the layer; only diagnose-ignore needs source attribution, and diagnose_dict already computes it. is_ignored now returns a plain bool; the two test assertions that checked the layer's source migrate to diagnose_dict. On a repo with ~100 ignore rules this cuts ~5000 spec compilations per indexed file down to one. Co-Authored-By: Claude <noreply@anthropic.com>

- ast_java: four byte-identical _codebase_*_inner_annotation_nodes walkers -> one _inner_annotation_nodes(node, src, target_simple). - graph_enrich: three identical _route/_client/_async _hint_lookup helpers -> one generic _hint_lookup (TypeVar) used at all three call sites. - ladybug_queries: find_callers/find_callees (~50-line near-twins) -> one _walk_calls helper; the two public methods keep their signatures and delegate, differing only in call-graph orientation. - pr_analysis: drop the dead 'notes' local in compute_risk (never appended; the real notes are assembled by analyze_pr_pipeline and merged there). meta()'s 5-level query cascade in ladybug_queries is intentionally left in place: it is reachable via direct LadybugGraph(...) construction that bypasses get()'s ontology gate, and test_feign_not_exoser relies on its fallback, so it is not cleanly dead right after the Ladybug migration. Co-Authored-By: Claude <noreply@anthropic.com>

…terals mcp_v2: 15 error branches passed hints=[] to NeighborsOutput/DescribeOutput, neither of which has a hints field (their fields are advisories / hints_structured, both defaulting to []). pydantic silently dropped the kwarg; today harmless, but the moment anyone adds extra='forbid' to those models every error branch would raise ValidationError swallowed by the catch-all. The models already default to empty, so the dead kwargs are simply removed. build_ast_graph: the four brownfield layer names were spelled out four times (_client_source_layer, _producer_source_layer, brownfield_strategies, and _BROWNFIELD_LAYERS). Promote _BROWNFIELD_LAYERS to the single source of truth and define brownfield_strategies as _BROWNFIELD_LAYERS plus the two caller-side declaration strategies (codebase_client/codebase_producer). The two sets still differ deliberately: _BROWNFIELD_LAYERS gates brownfield_only authoritativeness (per edge), while brownfield_strategies counts annotation-declared callers in the *_from_brownfield_pct stats — now that relationship is explicit instead of two independent literals that looked unrelated. Co-Authored-By: Claude <noreply@anthropic.com>

The KuzuDB->LadybugDB migration (commit #302) landed as code but its doc/string sweep and plan close-out were never finished, leaving operator and agent docs asserting the old store. This completes it: Docs (current-state Kuzu -> LadybugDB, code_graph.kuzu -> .lbug, --kuzu-path -> --ladybug-path, kuzu_queries.py -> ladybug_queries.py, KuzuGraph -> LadybugGraph, kuzu_path -> ladybug_path; ontology 15/16 -> 17): README, AGENTS, docs/CONFIGURATION, docs/JAVA-CODEBASE-RAG-CLI, docs/MANUAL-VERIFICATION-CHECKLIST, docs/CODEBASE_REQUIREMENTS, docs/AGENT-GUIDE, docs/PRODUCT-VISION, tests/README. Factual fixes surfaced by the markdown freshness audit: - README: DECLARES_ROUTE (nonexistent edge) -> EXPOSES; role list no longer lists PRODUCER (a node kind) and now includes COMPONENT/CONFIG/ENTITY; EMBEDDING_MODEL -> SBERT_MODEL (the real env var). - AGENT-GUIDE + SKILL: route frameworks corrected to spring_mvc/webflux (kafka/rabbitmq/jms/stream are route kinds; feign is a client kind). - PRODUCT-VISION: CALLS is shipped, not 'planned'. External citation titles (footnotes 12/17) intentionally left as 'Kuzu'. Shipped-artifact resync + plan close-out: - install_data/{skills,agents} explorer copies re-synced from source (they were behind, missing source_layer and the schema-rejection note). - Moved the landed PLAN/propose for LADYBUG-DB-MIGRATE and INDEX-OUTPUT-REWORK from active/ to completed/. Source docstring/help-string sweep only (cli/pr_analysis/mcp_v2/search_lancedb, conftest, test_ladybug_queries docstrings) — no behaviour change; the one clearly-stale kuzu 0.11.x version reference in mcp_v2 is genericized. Co-Authored-By: Claude <noreply@anthropic.com>

HumanBean17 and others added 5 commits June 15, 2026 22:51

Bump version to 0.6.3

838d00b

HumanBean17 merged commit a08f6bc into master Jun 17, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Close out review debts: perf fix, dedup, latent-trap cleanup, Ladybug migration doc close-out#333

Close out review debts: perf fix, dedup, latent-trap cleanup, Ladybug migration doc close-out#333
HumanBean17 merged 5 commits into
masterfrom
chore/review-debts

HumanBean17 commented Jun 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

HumanBean17 commented Jun 15, 2026

Debt 2 — is_ignored O(n²) on the index hot path

Debt 4 — mechanical duplication (4a–4d)

Debt 3 — latent footguns

Debt 1 — close out the LadybugDB migration

Verification

Notes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Debt 2 — `is_ignored` O(n²) on the index hot path